Information Extraction from Chemical patents
نویسندگان
چکیده
The development of new chemicals or pharmaceuticals is preceded by an indepth analysis of published patents in this field. This information retrieval is a costly and time inefficient step when done by a human reader, yet it is mandatory for potential success of an investment. The goal of the research project UIMA-HPC is to automate and hence speed-up the process of knowledge mining about patents. Multi-threaded analysis engines, developed according to UIMA (Unstructured Information Management Architecture) standards, process texts and images in thousands of documents in parallel. UNICORE (UNiform Interface to COmputing Resources) workflow control structures make it possible to dynamically allocate resources for every given task to gain best cpu-time/realtime ratios in an HPC environment.
منابع مشابه
Comparing manual and automated extraction of chemical entities from documents
The chemical information landscape is changing rapidly with a yearly increase of over 1 million new compounds and more than 700,000 publications related to chemistry [1]. Exploring the chemical space covered by relevant journals and patents is a crucial step in early stage medicinal chemistry projects. Extracting chemical entities from unstructured text is a complex task and different approache...
متن کاملManaging expectations: assessment of chemistry databases generated by automated extraction of chemical structures from patents
BACKGROUND First public disclosure of new chemical entities often takes place in patents, which makes them an important source of information. However, with an ever increasing number of patent applications, manual processing and curation on such a large scale becomes even more challenging. An alternative approach better suited for this large corpus of documents is the automated extraction of ch...
متن کاملSurvey on Information Extraction from Chemical Compound Literatures: Techniques and Challenges
Chemical documents, especially those involving drug information, comprise a variety of types – the most common being journal articles, patents and theses. They typically contain large amounts of chemical information, such as PubMed-ID, activity classes and adverse or side effects. Techniques are used to extract information from a huge number of documents and it is presented in a useful structur...
متن کاملOverview of the CHEMDNER patents task
A considerable effort has been made to extract biological and chemical entities, as well as their relationships, from the scientific literature, either manually through traditional literature curation or by using information extraction and text mining technologies. Medicinal chemistry patents contain a wealth of information, for instance to uncover potential biomarkers that might play a role in...
متن کاملMining Patents with tmChem, GNormPlus and an Ensemble of Open Systems
The significant amount of medicinal chemistry information contained in patents make them an attractive target for text mining. The CHEMDNER task at BioCreative V focused on information extraction from patents. This manuscript describes our submissions to the CEMP (chemical named entity recognition) and GPRO (gene and related object identification) subtasks. Our CEMP submission is an ensemble of...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید
ثبت ناماگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید
ورودعنوان ژورنال:
- Computer Science (AGH)
دوره 13 شماره
صفحات -
تاریخ انتشار 2012